Documentation of Replication Archive for "Information and Confrontation in Congressional Oversight"


Order:
Running scaling/fit_montgomery.R produces all of the results from the main paper, as well as
	the results from Appendices F, G, and H
Running topics/hearing_topics.py produces the table from Appendix C
Running topics/comp_topic_dist.py produces the figure from Appendix C
Running scaling/bias_analysis.R produces all of the results from Appendix D
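
A minimal Python sketch of running the four scripts above in sequence. It is not part of the archive. It assumes that Rscript is on the PATH, that the Python scripts run under the current interpreter, and that each script is launched from its own folder (scaling/ or topics/) so its relative input paths resolve. The names STEPS, build_commands, and run_all are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple

# (interpreter, script) pairs in the order listed under "Order".
STEPS = [
    ("Rscript", "scaling/fit_montgomery.R"),  # main paper; Appendices F, G, H
    ("python", "topics/hearing_topics.py"),   # Appendix C table
    ("python", "topics/comp_topic_dist.py"),  # Appendix C figure
    ("Rscript", "scaling/bias_analysis.R"),   # Appendix D
]


def build_commands(root: Path) -> List[Tuple[List[str], Path]]:
    """Return (argv, working directory) for each step.

    Assumption: each script is run from its own folder so that
    its relative input paths resolve.
    """
    commands = []
    for interpreter, rel_path in STEPS:
        script = root / rel_path
        # Reuse the interpreter running this sketch for the Python steps.
        exe = sys.executable if interpreter == "python" else interpreter
        commands.append(([exe, script.name], script.parent))
    return commands


def run_all(root: Path) -> None:
    """Run every step in order, stopping at the first non-zero exit."""
    for argv, cwd in build_commands(root):
        print("Running", " ".join(argv), "in", cwd)
        subprocess.run(argv, cwd=cwd, check=True)
```

For example, run_all(Path("path/to/archive")) runs every step and stops at the first script that fails, because check=True raises subprocess.CalledProcessError on a non-zero exit.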

Codebooks:
* Codebook.pdf: A codebook that describes the meaning of all of the variables.


Data:
* scaling/audit_comparisons.csv: The results from the survey of the auditing expert.
* scaling/comparisons.csv: The results of the survey.
* scaling/expert_comp.csv: Legislator-level background characteristics for all texts
	coded by experts.
* scaling/is_subcomm_hearing.csv: A hearing-level data set for whether a given
	hearingid corresponds to a subcommittee hearing.
* scaling/member_level_attendance.csv: A member-hearing level dataset of whether
	the given member spoke at the given hearing.
* scaling/partymap.json: The party for every legislator, given their ICPSR ID.
* scaling/ra_comp.csv: Legislator-level background characteristics for all texts
	coded by research assistants.
* topics/expert_hearingids_with_legacy.csv: A mapping from the GPO hearing IDs to the
	legacy hearing IDs used by the Policy Agendas Project.
* topics/manual_hearing_topics.csv: The topic codes of committee hearings that were missing from
	the Policy Agendas Project committee hearing data.
* topics/US-Legislative_congressional_hearings-21.4.csv: Topic codings for committee hearings
	from the Policy Agendas Project.
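
The three files under topics/ appear to fit together as follows: GPO hearing IDs link to legacy IDs, legacy IDs link to Policy Agendas Project topic codes, and hearings missing from that dataset are covered by the manual codes. The pandas sketch below illustrates one way to perform that join. It is not necessarily how hearing_topics.py does it, and the column names hearingid, legacyid, and majortopic, like the function name assign_topics, are hypothetical placeholders (see Codebook.pdf for the real variable names).

```python
import pandas as pd


def assign_topics(mapping: pd.DataFrame, pap: pd.DataFrame,
                  manual: pd.DataFrame) -> pd.DataFrame:
    """Attach a topic code to every GPO hearing ID.

    mapping: hearingid -> legacyid   (expert_hearingids_with_legacy.csv)
    pap:     legacyid  -> majortopic (Policy Agendas Project file)
    manual:  hearingid -> majortopic (manual_hearing_topics.csv)
    All column names here are hypothetical.
    """
    # Left join keeps every hearing in the mapping, in its original order.
    merged = mapping.merge(pap[["legacyid", "majortopic"]],
                           on="legacyid", how="left")
    # Hearings absent from the PAP data are NaN here; fill from manual codes.
    manual_codes = manual.set_index("hearingid")["majortopic"]
    merged["majortopic"] = merged["majortopic"].fillna(
        merged["hearingid"].map(manual_codes))
    return merged[["hearingid", "majortopic"]]
```

With the real files, the three arguments would be the results of pd.read_csv on each CSV after renaming columns to match.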

Code:
* scaling/base_montgomery_model.stan: The Stan model used by fit_montgomery.R.
* scaling/bias_analysis.R: Produces all of the results from Appendix D
* scaling/fit_montgomery.R: Produces all of the results from the main paper, as well as
	the results from Appendices F, G, and H
* topics/comp_topic_dist.py: Produces the figure from Appendix C
* topics/hearing_topics.py: Produces the table from Appendix C

Software:
* Python 3.9.1
* Python packages:
** matplotlib 3.4.3
** pandas 1.3.3
* R 4.3.1
* R packages:
** betareg 3.1-4
** dplyr 1.1.3
** ggplot2 3.4.4
** lfe 2.9-0
** rjson 0.2.21
** rstan 2.32.3
** stringr 1.5.0
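
Results can shift across package versions, so it may help to confirm the Python environment before running anything. A minimal sketch using the standard-library importlib.metadata module (Python 3.8+). The expected versions are the ones listed above; the function names installed_version, mismatches, and report are hypothetical.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions listed under "Software" in this README.
EXPECTED_PYTHON = "3.9.1"
EXPECTED_PACKAGES = {"matplotlib": "3.4.3", "pandas": "1.3.3"}


def installed_version(package: str) -> Optional[str]:
    """Installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def mismatches(expected: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs to what is installed."""
    found = {}
    for name, wanted in expected.items():
        got = installed_version(name)
        if got != wanted:
            found[name] = got
    return found


def report() -> None:
    """Print one line per difference from the README; silent if all match."""
    if platform.python_version() != EXPECTED_PYTHON:
        print(f"Python {platform.python_version()} (README lists {EXPECTED_PYTHON})")
    for name, got in mismatches(EXPECTED_PACKAGES).items():
        print(f"{name}: installed {got}, README lists {EXPECTED_PACKAGES[name]}")
```

A version mismatch does not necessarily change the results; it only flags where differences could come from. The R packages can be checked in the same spirit from R with calls such as packageVersion("rstan").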


Data Sources:
* scaling/audit_comparisons.csv: See comparisons.csv.

* scaling/comparisons.csv: The comparisons and the rater's party come from an original survey. 
	Hearing IDs and the underlying text used in the survey were drawn from the Government
	Printing Office's website using code provided by Robert Shaffer
	https://github.com/rbshaffer/gpo_tools

* scaling/expert_comp.csv: Professional background data was manually collected by Sujin Kim.
	The remaining variables come from the Center for Effective Lawmaking.
	Volden, Craig and Alan Wiseman (2023). Center for Effective Lawmaking, 2023.  
	https://thelawmakers.org/

* scaling/is_subcomm_hearing.csv: This was manually obtained from the text of the hearings.

* scaling/member_level_attendance.csv: The committee ID of the hearing, the member's attendance,
	and the date of the hearing were manually compiled from the hearing texts. The
	seniority of the member comes from the Center for Effective Lawmaking.
	Volden, Craig and Alan Wiseman (2023). Center for Effective Lawmaking, 2023.  
	https://thelawmakers.org/
	Committee membership comes from Charles Stewart's congressional data.
	Charles Stewart III and Jonathan Woon.  Congressional Committee Assignments, 
	103rd to 114th Congresses, 1993--2017:

* scaling/partymap.json: This JSON is simply a transformation of the party affiliations from
	voteview.com.  Lewis, Jeffrey B., Keith Poole, Howard Rosenthal, Adam Boche, Aaron Rudkin, 
	and Luke Sonnet (2023). Voteview: Congressional Roll-Call Votes Database. https://voteview.com/

* scaling/ra_comp.csv: See expert_comp.csv.

* topics/expert_hearingids_with_legacy.csv: These legacy IDs were manually collected by searching
	for each hearing ID on congressional.proquest.com.

* topics/manual_hearing_topics.csv: These hearings were missing from the Policy Agendas Project and
	were manually coded by Andrew Miller.

* topics/US-Legislative_congressional_hearings-21.4.csv: This comes from the Policy Agendas Project.
	Hearings. The Policy Agendas Project at the University of Texas at Austin, 2017. 
	www.comparativeagendas.net. Accessed August 7, 2023.
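
scaling/partymap.json maps ICPSR IDs to parties. Because JSON object keys are always strings, looking up a party by an integer ICPSR ID needs a conversion step. A minimal sketch, assuming the file is a single flat object of the form {"<ICPSR ID>": <party>}; party values are returned exactly as stored, whatever coding the file uses, and the function names parse_partymap and load_partymap are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def parse_partymap(text: str) -> Dict[int, Any]:
    """Parse partymap.json text into {ICPSR ID as int: party as stored}."""
    raw = json.loads(text)
    # JSON keys are strings; convert them so integer ICPSR IDs match.
    return {int(icpsr): party for icpsr, party in raw.items()}


def load_partymap(path: Union[str, Path]) -> Dict[int, Any]:
    """Read and parse the file at `path` (e.g. scaling/partymap.json)."""
    return parse_partymap(Path(path).read_text(encoding="utf-8"))
```

For example, load_partymap("scaling/partymap.json").get(icpsr_id) returns the stored party, or None for an ID not in the file.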

